AITopics | cnn encoder

Country:

Asia > China > Hong Kong (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Neural Information Processing SystemsDec-23-2025, 21:03:26 GMT

Revitalizing CNN Attention via Transformers in Self-Supervised Visual Representation Learning

Studies on self-supervised visual representation learning (SSL) improve encoder backbones to discriminate training samples without labels. While CNN encoders via SSL achieve comparable recognition performance to those via supervised learning, their network attention is under-explored for further improvement. Motivated by the transformers that explore visual attention effectively in recognition scenarios, we propose a CNN Attention REvitalization (CARE) framework to train attentive CNN encoders guided by transformers in SSL. The proposed CARE framework consists of a CNN stream (C-stream) and a transformer stream (T-stream), where each stream contains two branches. C-stream follows an existing SSL framework with two CNN encoders, two projectors, and a predictor.

cnn encoder, revitalizing cnn attention, self-supervised visual representation learning, (8 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.56)
Information Technology > Artificial Intelligence > Vision (0.36)

Neural Information Processing SystemsOct-2-2025, 20:42:56 GMT

21be992eb8016e541a15953eee90760e-Paper.pdf

artificial intelligence, machine learning, natural language, (12 more...)

Country:

Asia > China > Hong Kong (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Park, Seongwan, Woo, Jieun, Yang, Siheon

MIS-LSTM: Multichannel Image-Sequence LSTM for Sleep Quality and Stress Prediction

arXiv.org Artificial IntelligenceSep-16-2025

This paper presents MIS-LSTM, a hybrid framework that joins CNN encoders with an LSTM sequence model for sleep quality and stress prediction at the day level from multimodal lifelog data. Continuous sensor streams are first partitioned into N-hour blocks and rendered as multi-channel images, while sparse discrete events are encoded with a dedicated 1D-CNN. A Convolutional Block Attention Module fuses the two modalities into refined block embeddings, which an LSTM then aggregates to capture long-range temporal dependencies. To further boost robustness, we introduce UALRE, an uncertainty-aware ensemble that overrides lowconfidence majority votes with high-confidence individual predictions. Experiments on the 2025 ETRI Lifelog Challenge dataset show that Our base MISLSTM achieves Macro-F1 0.615; with the UALRE ensemble, the score improves to 0.647, outperforming strong LSTM, 1D-CNN, and CNN baselines. Ablations confirm (i) the superiority of multi-channel over stacked-vertical imaging, (ii) the benefit of a 4-hour block granularity, and (iii) the efficacy of modality-specific discrete encoding.

artificial intelligence, machine learning, representation, (19 more...)

2509.11232

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Consumer Health (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing SystemsOct-9-2024, 18:31:29 GMT

Revitalizing CNN Attention via Transformers in Self-Supervised Visual Representation Learning

Studies on self-supervised visual representation learning (SSL) improve encoder backbones to discriminate training samples without labels. While CNN encoders via SSL achieve comparable recognition performance to those via supervised learning, their network attention is under-explored for further improvement. Motivated by the transformers that explore visual attention effectively in recognition scenarios, we propose a CNN Attention REvitalization (CARE) framework to train attentive CNN encoders guided by transformers in SSL. The proposed CARE framework consists of a CNN stream (C-stream) and a transformer stream (T-stream), where each stream contains two branches. C-stream follows an existing SSL framework with two CNN encoders, two projectors, and a predictor.

cnn encoder, revitalizing cnn attention, self-supervised visual representation learning, (9 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.59)
Information Technology > Artificial Intelligence > Vision (0.39)

Udayantha, Dinuka Sandun, Weerasinghe, Kavindu, Wickramasinghe, Nima, Abeyratne, Akila, Wickremasinghe, Kithmin, Wanigasinghe, Jithangi, De Silva, Anjula, Edussooriya, Chamira

Using Explainable AI for EEG-based Reduced Montage Neonatal Seizure Detection

arXiv.org Artificial IntelligenceJun-4-2024

The neonatal period is the most vulnerable time for the development of seizures. Seizures in the immature brain lead to detrimental consequences, therefore require early diagnosis. The gold-standard for neonatal seizure detection currently relies on continuous video-EEG monitoring; which involves recording multi-channel electroencephalogram (EEG) alongside real-time video monitoring within a neonatal intensive care unit (NICU). However, video-EEG monitoring technology requires clinical expertise and is often limited to technologically advanced and resourceful settings. Cost-effective new techniques could help the medical fraternity make an accurate diagnosis and advocate treatment without delay. In this work, a novel explainable deep learning model to automate the neonatal seizure detection process with a reduced EEG montage is proposed, which employs convolutional nets, graph attention layers, and fully connected layers. Beyond its ability to detect seizures in real-time with a reduced montage, this model offers the unique advantage of real-time interpretability. By evaluating the performance on the Zenodo dataset with 10-fold cross-validation, the presented model achieves an absolute improvement of 8.31% and 42.86% in area under curve (AUC) and recall, respectively.

detection, seizure, seizure detection, (15 more...)

2406.16908

Country:

Europe > Finland > Uusimaa > Helsinki (0.04)
Asia > Sri Lanka > Western Province > Colombo > Colombo (0.04)
North America > Canada > British Columbia (0.04)
Europe > Italy > Emilia-Romagna > Parma Province (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area > Pediatrics/Neonatology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Solomon, Enoch, Woubie, Abraham, Emiru, Eyael Solomon

Deep Learning Based Face Recognition Method using Siamese Network

arXiv.org Artificial IntelligenceFeb-8-2024

Achieving state-of-the-art results in face verification systems typically hinges on the availability of labeled face training data, a resource that often proves challenging to acquire in substantial quantities. In this research endeavor, we proposed employing Siamese networks for face recognition, eliminating the need for labeled face images. We achieve this by strategically leveraging negative samples alongside nearest neighbor counterparts, thereby establishing positive and negative pairs through an unsupervised methodology. The architectural framework adopts a VGG encoder, trained as a double branch siamese network. Our primary aim is to circumvent the necessity for labeled face image data, thus proposing the generation of training pairs in an entirely unsupervised manner. Positive training data are selected within a dataset based on their highest cosine similarity scores with a designated anchor, while negative training data are culled in a parallel fashion, though drawn from an alternate dataset. During training, the proposed siamese network conducts binary classification via cross-entropy loss. Subsequently, during the testing phase, we directly extract face verification scores from the network's output layer. Experimental results reveal that the proposed unsupervised system delivers a performance on par with a similar but fully supervised baseline.

dataset, face image, recognition, (14 more...)

2312.14001

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Virginia > Richmond (0.04)
North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
(6 more...)

Genre: Research Report > New Finding (0.68)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Chen, Wei, Rojas, Nicolas

TraKDis: A Transformer-based Knowledge Distillation Approach for Visual Reinforcement Learning with Application to Cloth Manipulation

arXiv.org Artificial IntelligenceJan-24-2024

Approaching robotic cloth manipulation using reinforcement learning based on visual feedback is appealing as robot perception and control can be learned simultaneously. However, major challenges result due to the intricate dynamics of cloth and the high dimensionality of the corresponding states, what shadows the practicality of the idea. To tackle these issues, we propose TraKDis, a novel Transformer-based Knowledge Distillation approach that decomposes the visual reinforcement learning problem into two distinct stages. In the first stage, a privileged agent is trained, which possesses complete knowledge of the cloth state information. This privileged agent acts as a teacher, providing valuable guidance and training signals for subsequent stages. The second stage involves a knowledge distillation procedure, where the knowledge acquired by the privileged agent is transferred to a vision-based agent by leveraging pre-trained state estimation and weight initialization. TraKDis demonstrates better performance when compared to state-of-the-art RL techniques, showing a higher performance of 21.9%, 13.8%, and 8.3% in cloth folding tasks in simulation. Furthermore, to validate robustness, we evaluate the agent in a noisy environment; the results indicate its ability to handle and adapt to environmental uncertainties effectively. Real robot experiments are also conducted to showcase the efficiency of our method in real-world scenarios.

agent, information, manipulation, (15 more...)

2401.13362

Country:

Europe > United Kingdom (0.04)
Asia > China (0.04)

Genre: Research Report (1.00)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Radfar, Martin, Barnwal, Rohit, Swaminathan, Rupak Vignesh, Chang, Feng-Ju, Strimel, Grant P., Susanj, Nathan, Mouchtaris, Athanasios

ConvRNN-T: Convolutional Augmented Recurrent Neural Network Transducers for Streaming Speech Recognition

arXiv.org Artificial IntelligenceSep-29-2022

The recurrent neural network transducer (RNN-T) is a prominent streaming end-to-end (E2E) ASR technology. In RNN-T, the acoustic encoder commonly consists of stacks of LSTMs. Very recently, as an alternative to LSTM layers, the Conformer architecture was introduced where the encoder of RNN-T is replaced with a modified Transformer encoder composed of convolutional layers at the frontend and between attention layers. In this paper, we introduce a new streaming ASR model, Convolutional Augmented Recurrent Neural Network Transducers (ConvRNN-T) in which we augment the LSTM-based RNN-T with a novel convolutional frontend consisting of local and global context CNN encoders. ConvRNN-T takes advantage of causal 1-D convolutional layers, squeeze-and-excitation, dilation, and residual blocks to provide both global and local audio context representation to LSTM layers. We show ConvRNN-T outperforms RNN-T, Conformer, and ContextNet on Librispeech and in-house data. In addition, ConvRNN-T offers less computational complexity compared to Conformer. ConvRNN-T's superior accuracy along with its low footprint make it a promising candidate for on-device streaming ASR technologies.

artificial intelligence, encoder, machine learning, (16 more...)

2209.14868

Country: North America > United States (0.04)

Genre: Research Report (0.65)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Ranjan, Rohit, Bhakta, Himadri, Jha, Animesh, Maheshwari, Parv, Chakravarty, Debashish

[Re] Differentiable Spatial Planning using Transformers

arXiv.org Artificial IntelligenceAug-19-2022

This report covers our reproduction effort of the paper 'Differentiable Spatial Planning using Transformers' by Chaplot et al. . In this paper, the problem of spatial path planning in a differentiable way is considered. They show that their proposed method of using Spatial Planning Transformers outperforms prior data-driven models and leverages differentiable structures to learn mapping without a ground truth map simultaneously. We verify these claims by reproducing their experiments and testing their method on new data. We also investigate the stability of planning accuracy with maps with increased obstacle complexity. Efforts to investigate and verify the learnings of the Mapper module were met with failure stemming from a paucity of computational resources and unreachable authors.

accuracy, dataset, differentiable spatial planning, (11 more...)

doi: 10.5281/zenodo.6475614

2208.09536

Country: Asia > India > West Bengal > Kharagpur (0.04)

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)